Quantitative Biology

Wiley

Preprints posted in the last 30 days, ranked by how well they match Quantitative Biology's content profile, based on 11 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
TopoFuseNet: Hierarchical Graph Representation Learning with Multi-Scale Topological Features for Accurate Drug Synergy Prediction

Wang, Q.; Shi, X.

2026-05-08 bioinformatics 10.64898/2026.05.05.722940 medRxiv
Top 0.4%
1.2%

Accurate prediction of drug synergy is paramount for developing effective combination therapies and advancing personalized medicine. Although methods based on graph neural networks (GNNs) have become a prevalent approach, they often treat molecules as flat graphs of connected atoms, thus overlooking their inherent hierarchical structure (i.e., atoms forming functional groups) and the critical topological information that governs molecular interactions. To address this limitation, we introduce TopoFuseNet, a novel hierarchical graph representation learning framework that integrates multi-scale topological features. The core innovations of TopoFuseNet include: 1) The first-ever application of "Group Centrality" from network science to cheminformatics, enabling the identification and quantification of functional groups crucial to drug activity; 2) A systematic, multi-path strategy to seamlessly integrate node-level (atom) and group-level (functional group) topological features into a Graph Attention Network (GAT) via feature augmentation, attention biasing, and hierarchical pooling; 3) A Differential Transformer module to deeply fuse multi-modal features learned from sequences, fingerprints, and our proposed hierarchical graph representations. Extensive experiments on two large-scale benchmark datasets, DrugComb and DrugCombDB, demonstrate that TopoFuseNet significantly outperforms state-of-the-art methods across multiple key metrics, including AUC, AUPRC, and F1-score, while exhibiting exceptional generalization robustness under various stringent cold-start scenarios. In-depth ablation studies further confirm the effectiveness and necessity of each proposed innovative module.
Furthermore, multi-scale interpretability analysis and zero-shot cross-domain transfer experiments reveal that the model successfully captures molecular interaction rules with clear pharmacological significance, demonstrating immense practical potential for discovering novel combination therapies through large-scale virtual screening. Our work not only delivers a superior model for drug synergy prediction, but more importantly, it establishes a novel and scalable paradigm for effectively integrating hierarchical molecular structures and topological information into GNNs.

2
Bayesian-Steered Structure Prediction of Mechanical Biomolecules Using Twisted Diffusion

Klaus, C.; Sotomayor, M.

2026-05-13 bioinformatics 10.64898/2026.05.11.724187 medRxiv
Top 0.5%
0.9%

Deep learning approaches have revolutionized protein structure prediction. These tools are trained using experimental data and recapitulate reported conformations, but there is great interest in predicting conformations that may be functionally relevant although experimentally underrepresented. Since many modern structure prediction tools use generative artificial intelligence diffusion models, we reframe the search for alternative molecular conformations as that of sampling from a diffusion distribution conditioned using any arbitrary Bayesian likelihood. We implement a twisted diffusion sampler in Boltz-2 to sample this conditioned distribution and demonstrate the utility of this approach, which does not require any additional training of the neural network, by implementing a diffusion analog of steered molecular dynamics simulations applied to mechanical systems. We can reproduce predicted stretched states of fragments of DNA, the muscle protein titin, and the inner-ear protocadherin-15 protein, as well as open states of the MscL ion channel consistent with experimental results. We expect that steered structure predictions will help sample underrepresented and non-equilibrium conformations for many macromolecular systems.

3
Cholesteryl Esters Modulate Lipid Droplet Rigidity and Monolayer Organization during Liver Cancer Progression

Campbell, O.; Leal, C.; Monje, V.

2026-05-05 biophysics 10.64898/2026.05.01.722229 medRxiv
Top 0.8%
0.7%

In mammalian cells, lipid monolayers support the integrity of lipid droplets (LDs), organelles that function as storage for neutral lipids. Liver-targeting illnesses such as liver cancer interrupt normal LD metabolism and prompt changes in the chemical content of these organelles, which can affect the structural and organizational behavior of the lipids. In LDs, liver cancer induces concentric crystalline phases of cholesteryl esters (CEs) and triglycerides near the NL-monolayer interface, which become more pronounced as CE concentration increases. Yet little is known about how this phenomenon may link to the persistence of undigested LDs in liver cancer patients. To shed light on this, all-atom molecular dynamics simulations were used to model LD micropipette aspiration experiments and gain insight into the effect of CE concentration on partitioning, structural, and mechanical properties of LDs. We successfully model micropipette aspiration by applying a constant lateral surface tension, which stretches lipid bilayers and monolayers as its magnitude increases. The results show increased phospholipid packing due to insertion of CE fatty tails into the monolayer. Increasing CE concentration induces a non-linear change in surface packing defects on the LDs, along with notable rigidification and stiffening. Taken together, these insights improve our understanding of the physical properties at the LD monolayer-core interface during liver cancer progression.

4
Promises and limitations of local ancestry inference in imputed ancient genomes

Bougiouri, K.; Irving-Pease, E. K.; Frantz, L. A. F.; Racimo, F.; Petr, M.

2026-05-20 evolutionary biology 10.64898/2026.05.19.725905 medRxiv
Top 0.8%
0.7%

Recent advances in genome imputation have enabled the application of state-of-the-art statistical methods--originally developed for present-day genomes--to ancient genomes. One class of such methods, known as local ancestry inference (LAI), can model an individual's genome as a mosaic of tracts assigned to different putative ancestral sources, revealing patterns of genetic ancestry across the genome. However, most LAI methods have been designed to study recent admixture events in human history, and they generally assume large panels of present-day genomes. Despite the recent availability of high-quality imputed ancient genomes, it remains unknown to what degree LAI is reliable for such datasets. Ancient DNA is often characterized by heterogeneous geographic and temporal sampling, varying degrees of divergence between ancient source proxies and admixing populations, and complex demographic histories. Here, we performed an extensive set of population genetic simulations to evaluate the accuracy of four popular LAI methods--RFMix, FLARE, MOSAIC and simpLAI--under different demographic scenarios, various temporal sampling schemes, sample sizes, and admixture dates. We quantify the accuracy of these methods as a function of different parameters in practically relevant scenarios, and provide general guidelines for future studies utilizing LAI in ancient DNA research.

5
Elasticity of a three-dimensional cell vertex model of epithelia

Terada, K.; Kondo, Y.

2026-05-18 biophysics 10.64898/2026.05.15.725329 medRxiv
Top 0.8%
0.7%

Mechanical properties of epithelial tissues play essential roles in morphogenesis and physiological function. In this study, we analytically derived the in-plane bulk modulus, shear modulus, and Poisson's ratio of a three-dimensional cell vertex model of epithelial monolayers. We showed that the model can robustly reproduce a near-zero in-plane Poisson's ratio, a mechanical feature reported in cultured epithelial tissues. Numerical simulations further confirmed that the theoretically predicted Poisson's ratio accurately describes the response of the model under finite, biologically relevant strains. In addition, the model exhibits not only morphological bistability between squamous-like and columnar-like states, but also mechanical bistability characterized by distinct elastic responses. Together, these results provide a minimal three-dimensional framework that links cell-scale mechanical interactions and epithelial morphology to tissue-scale elastic properties.

6
Efficient Bayesian inference for ordinary differential equation models from experimental data with uncertain measurement times

Vanhoefer, J.; Nakonecnij, V.; Binder, N.; Hasenauer, J.

2026-05-13 systems biology 10.64898/2026.05.09.724053 medRxiv
Top 0.8%
0.7%

Time-resolved measurements are central to calibrating mechanistic dynamical models, but current inference frameworks typically assume that reported measurement times are exact. In practice, actual sampling times may deviate from reported times because of sample-handling delays, imperfect synchronization, or reporting errors. Here, we present a Bayesian framework for parameter inference in ordinary differential equation models that explicitly accounts for uncertainty in measurement times. We formulate latent measurement times as random variables and derive a joint and marginalized posterior. To compute the marginal likelihood efficiently, we augment the original dynamical system with additional state variables that evaluate the required integrals during numerical simulation. This reduces the dimensionality of the estimation problems and allows for efficient and reliable Markov chain Monte Carlo sampling. Across synthetic examples and a published model of carotenoid cleavage in Arabidopsis thaliana, neglecting time uncertainty led to biased estimates and overconfident uncertainty quantification, whereas the proposed marginalized formulation recovered reliable parameter estimates while substantially improving sampling efficiency and scalability. These results identify measurement time uncertainty as an important source of variability in dynamic modeling and establish posterior marginalization as a practical strategy for robust mechanistic inference.

7
Stereochemistry-Aware Drug-Target Affinity Prediction

Ferreyra, S.; Dutra, I.; Galeano, A.; Paccanaro, A.

2026-05-18 bioinformatics 10.64898/2026.05.14.725200 medRxiv
Top 0.9%
0.6%

Drug-target affinity (DTA) prediction is a key task in drug discovery, enabling the estimation of the interaction strength between candidate compounds and biological targets. However, current models rely on connectivity-based molecular representations and do not explicitly account for the spatial organization of atoms, known as stereochemistry. This limitation becomes evident when considering chirality, where a drug can exist as enantiomers, i.e., molecules that share the same atoms and bonds but differ in their three-dimensional arrangement. Despite their chemical similarity, they can interact differently with the same target, leading to variations in binding affinity and biological activity. In this paper, we propose a stereochemistry-aware DTA prediction framework that incorporates this information into molecular representations. Drug representations are learned from chemical structure using a directed-bond message passing graph neural network that captures enantiomer configurations, while protein targets are represented through sequence-based embeddings. Experiments on the Davis dataset demonstrate that our model can improve affinity prediction. Importantly, a case study on a manually curated dataset of enantiomers with different biological action shows that the model is able to distinguish the affinities of the two forms, consistent with their experimentally observed biological activity. These findings support the relevance of stereochemistry-aware molecular representation for more accurate and chemically faithful DTA prediction.

8
A Multimodal Neural Network Model for Early Recurrence Prediction in Lung Adenocarcinoma

Patricoski-Chavez, J. A.; Hayek, K.; Singh, R.; Azzoli, C. G.; Warner, J. L.; Gamsiz Uzun, E. D.

2026-05-18 bioinformatics 10.64898/2026.05.14.725244 medRxiv
Top 1.0%
0.5%

Lung adenocarcinoma (LUAD), a subtype of non-small cell lung cancer (NSCLC), is the most common primary lung cancer worldwide. Despite advancements in early detection and treatment, up to 39% of patients develop recurrent tumors following complete resection. Currently, no widely available models exist for reliably predicting early recurrence of LUAD, which is a significant prognostic factor of post-recurrence survival. Models leveraging deep learning (DL) techniques have demonstrated notable utility in cancer recurrence prediction, particularly when used in combination with both clinical and genomic data. We developed a DL-based model, Predicting Lung Adenocarcinoma recurrence via Selective Multimodal Attention (PLASMA), to predict early recurrence using clinical, mRNA expression, and mutation data from patients with primary stage I-III LUAD. Trained on The Cancer Genome Atlas (TCGA) dataset, PLASMA outperformed traditional machine learning models in predicting early recurrence in both the TCGA test set and an external validation set (TRACERx Lung), achieving area under the receiver operating characteristic curve (AUROC) scores of 85.0% and 76.5%, respectively. Our results support the potential of multimodal DL for early LUAD recurrence prediction and risk stratification.

9
Machine learning-based Personalized Dietary Recommendations to Achieve Desired Gut Microbial Compositions

Wang, X.-W.; Huang, D.; Yu, P.; Weiss, S.; Liu, Y.-Y.

2026-05-15 bioinformatics 10.64898/2026.05.12.724618 medRxiv
Top 1%
0.5%

Dietary intervention is an effective way to alter the gut microbiome to promote human health. Yet, due to our limited knowledge of diet-microbe interactions and the highly personalized gut microbial compositions, an efficient method to prescribe personalized dietary recommendations to achieve desired gut microbial compositions is still lacking. Here, we propose a machine learning framework to resolve this challenge. Our key idea is to implicitly learn the diet-microbe interactions by training a machine learning model using paired gut microbiome and dietary intake data from a population-level cohort. The well-trained machine learning model enables us to predict the microbial composition of any given species collection and dietary intake. Next, we prescribe personalized dietary recommendations by solving an optimization problem to achieve the desired microbial compositions. We systematically validated this Machine learning-based Personalized Dietary Recommendation (MPDR) framework using synthetic data generated from an established microbial consumer-resource model. We then validated MPDR using real data collected from a diet-microbiome association study. The presented MPDR framework demonstrates the potential of machine learning for personalized nutrition.

10
From time-course expression to gene regulation: direct linear ODE inference without finite-difference approximation

Huang, X.; Ang, A.; Vasoya, A. P.; Wang, Y.; Teresa, P.

2026-05-20 systems biology 10.64898/2026.05.18.726023 medRxiv
Top 1%
0.5%

Inferring gene regulation from time-course expression profiles is essential for understanding how cells transition between states during development, differentiation, and disease progression. Existing approaches often model expression dynamics with ordinary differential equations (ODEs). However, due to the computational complexity of directly solving these ODE models, most methods rely on finite-difference approximations of temporal derivatives, which can amplify measurement noise, introduce discretization bias, and lead to unstable or biased parameter estimates. To fill this gap, we develop the first computational method to directly learn a linear ODE model for gene regulation inference without relying on finite-difference approximations. We first formulate an optimization problem that directly exploits the closed-form solution of the linear ODE system. We then solve this problem via gradient descent, deriving analytical gradients with respect to the model parameters; these gradients involve matrix exponentials and integrals, which are challenging to compute directly. To make the computation efficient, we further use high-order Taylor approximations of the gradients whose truncation error is on the order of machine precision. In addition, we establish theoretical results demonstrating an inherent, non-vanishing gap between our exact solution and solutions derived from finite-difference approximations, which underscores the theoretical advantages of our approach. Finally, we demonstrate that our method consistently outperforms competing approaches on both simulated data and real-world scRNA-seq datasets in terms of AUROC. Our source code is available at https://github.com/EJIUB/ExactLinearODE
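
The contrast between the closed-form solution of a linear ODE and a finite-difference scheme can be illustrated with a toy example. The sketch below is not the authors' code: the function names, the 2x2 test matrix, the Taylor order of 30, and the forward-Euler step count are all assumptions chosen for illustration. It evaluates x(t) = exp(At) x(0) with a truncated Taylor series and compares it with a forward-Euler approximation of dx/dt = Ax.

```python
import math

def mat_mul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def mat_exp_taylor(a, t, order=30):
    """Approximate exp(a * t) with a Taylor series truncated at `order` (assumed value)."""
    n = len(a)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    at = [[a[i][j] * t for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, order + 1):
        # term becomes (a * t)^k / k!
        term = [[v / k for v in row] for row in mat_mul(term, at)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def closed_form_state(a, x0, t):
    """State of dx/dt = a x at time t, using x(t) = exp(a t) x0."""
    e = mat_exp_taylor(a, t)
    return [sum(e[i][j] * x0[j] for j in range(len(x0))) for i in range(len(x0))]

def forward_euler_state(a, x0, t, steps):
    """Finite-difference approximation: x <- x + h * a x, repeated `steps` times."""
    h = t / steps
    x = list(x0)
    for _ in range(steps):
        dx = [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
        x = [x[i] + h * dx[i] for i in range(len(x))]
    return x

# Toy system (hypothetical): a pure rotation, whose exact solution is known.
A = [[0.0, 1.0], [-1.0, 0.0]]
x0 = [1.0, 0.0]
exact = [math.cos(1.0), -math.sin(1.0)]
closed = closed_form_state(A, x0, 1.0)
euler = forward_euler_state(A, x0, 1.0, steps=10)
closed_error = max(abs(c - e) for c, e in zip(closed, exact))
euler_error = max(abs(c - e) for c, e in zip(euler, exact))
```

In this toy case the truncated series reaches roughly machine precision, while the ten-step Euler result carries a visible discretization error; the abstract's theoretical gap result is a statement about the inference problem and is not reproduced here.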

11
Learning Chirality-Aware Representations to Predict Drug Side Effect Frequencies

Galeano, A.; Dutra, I.; Ferreyra, S.; Paccanaro, A.

2026-05-18 bioinformatics 10.64898/2026.05.14.725209 medRxiv
Top 1%
0.5%

Ab initio prediction of side effect frequencies is important for assessing the risk-benefit profile of drugs and for identifying potential adverse effects early in development. A key challenge is chirality: many drugs exist as enantiomers, pairs of molecules with the same atoms and bond connectivity but different three-dimensional arrangements. Although chemically similar, enantiomers can interact differently with biological targets and therefore exhibit distinct efficacy and adverse-effect profiles. Here we introduce F2S (Features to Signatures), a method to predict the frequencies of drug side effects while explicitly accounting for chirality. Drug representations are learned directly from chemical structure using a directed-bond message-passing graph neural network that captures stereochemical configurations. Side effect representations are derived from curated textual descriptions encoded with a frozen PubMedBERT model. Side effect frequencies are predicted from the dot product between drug and side effect signatures together with biases for drugs and side effects. We evaluated F2S extensively across multiple settings, including cold-start and warm-start prediction, prospective evaluation, and scenarios controlling for chemical similarity between training and test drugs. Across these evaluations, F2S achieves performance comparable to state-of-the-art methods for general side-effect frequency prediction while producing fewer false positives and substantially improving the prediction of frequency differences between enantiomer pairs. Finally, F2S learns compact 10-dimensional signatures that support interpretability: drug signatures reflect therapeutic class and shared targets, side-effect signatures capture phenotype similarity, and the learned bias terms correlate with the popularity of drugs and side effects.
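
The scoring rule described in the abstract, a dot product between a drug signature and a side effect signature plus one bias per drug and per side effect, can be written in a few lines. This is a minimal sketch, not the F2S implementation: the function name and the toy values are hypothetical, and the example uses 2-dimensional signatures for brevity where the abstract reports 10-dimensional ones.

```python
def predict_frequency(drug_signature, effect_signature, drug_bias, effect_bias):
    """Score = <drug signature, side effect signature> + drug bias + side effect bias."""
    if len(drug_signature) != len(effect_signature):
        raise ValueError("signatures must have the same dimension")
    dot = sum(d * e for d, e in zip(drug_signature, effect_signature))
    return dot + drug_bias + effect_bias

# Toy values, assumed purely for illustration.
score = predict_frequency([1.0, 2.0], [0.5, 0.25], drug_bias=0.1, effect_bias=0.2)
```

How the raw score is mapped to a frequency class is not stated in the abstract, so the sketch stops at the score.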

12
Spatiotemporal Modeling of GPCR Signaling: The Role of Endosomal Dynamics and Receptor Recycling

Weckel, C.; Gourdon, J.; Darrigade, L.; Jugnarain, V.; Crepieux, P.; Reiter, E.; Jean-Alphonse, F.; Haar, S.; Yvinec, R.

2026-05-04 systems biology 10.64898/2026.04.29.721559 medRxiv
Top 1%
0.5%

Cells communicate via extracellular ligands, such as hormones, which bind to plasma membrane receptors and trigger intracellular signaling cascades. G Protein-Coupled Receptors (GPCRs) exemplify this mechanism by initiating signaling both at the cell surface and from intracellular compartments such as endosomes. The kinetics and spatial localization of these signals are critical determinants of cellular responses, yet receptor trafficking--including internalization, endosomal sorting, and recycling--remains a pivotal but often overlooked component of theoretical GPCR models. In this study, we present a mathematical framework that integrates receptor trafficking and signaling compartmentalization into generic GPCR dynamic models. Using a compartmentalized approach based on systems of ordinary differential equations (Chemical Reaction Networks), we analyze how receptor internalization and recycling modulate ligand-induced responses. Our results show that the balance between plasma membrane and endosomal signaling can significantly enhance or diminish ligand efficacy. Calibrated with high-throughput kinetic data, our model offers a refined tool for ligand pharmacological characterization and advances the understanding of GPCR signaling spatial organization.

13
A community machine learning challenge to predict the effects of gene perturbations on T cell differentiation for cancer immunotherapy

Zhang, J.; Schwartz, M. A.; Mutaher, M.; Olajide, O.; Pritykin, Y.; Ashenberg, O.; Hacohen, N.; Uhler, C.

2026-05-22 bioinformatics 10.64898/2026.05.21.726863 medRxiv
Top 1%
0.5%

Perturbations of genes with functional importance in T cells could be used to change the distribution of CD8 T cell states to enhance anti-tumor functions for cancer immunotherapies. We launched a world-wide computational challenge to predict the effects of gene perturbations and to devise objective functions for prioritizing gene perturbations that lead to desired T-cell state distributions. We supported the challenge by generating a single-cell Perturb-seq dataset profiling the effect of knocking out 73 individual expert-defined genes in T cells transferred into a mouse melanoma model. We compared the top algorithms developed by participants and found that performance was primarily determined by the prior data used for gene feature representation, with features derived from perturbational data proving most effective. Experimental validation of the top 61 genes nominated by the algorithms revealed that perturbation of Ndufv2 and Dimt1 reached the defined objective and biased T cell differentiation toward desired states.

14
Dual-Stream Compression of High Bit-Depth Medical Images with Application to DNA Storage

Su, H.; Fan, W.; Peng, J.; Zhang, Y.

2026-05-20 bioinformatics 10.64898/2026.05.17.724501 medRxiv
Top 1%
0.5%

High bit-depth medical images preserve subtle intensity variations that are important for quantitative analysis and clinical interpretation, but their large dynamic range poses challenges for efficient compression. We propose a bit-plane-aware dual-stream compression framework for 16-bit medical images by separately modeling the most significant bit (MSB) and least significant bit (LSB) components. The MSB structural stream is encoded using JPEG coding with a Duplicate Segment Skipping (DSS) strategy to exploit spatial and segment-level redundancy, while the LSB detail stream is compressed using learned image compression to represent residual variations and fine-grained details. Experiments on four MRI and CT datasets show that the proposed method consistently outperforms representative traditional and learning-based codecs, achieving the lowest bit rate across all datasets. Meanwhile, it preserves high reconstruction fidelity. As a downstream application, we further demonstrate that the compressed bitstreams can be effectively integrated with DNA encoding and converted into sequences with favorable biochemical properties.
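
The MSB/LSB separation at the heart of the dual-stream idea can be shown on raw 16-bit values. The sketch below is an assumption-laden illustration rather than the authors' pipeline: the abstract does not say where the split falls, so an even 8/8 split is assumed, the function names are hypothetical, and the JPEG, Duplicate Segment Skipping, and learned-compression stages are omitted entirely.

```python
def split_bit_planes(pixels):
    """Split 16-bit values into a high-byte (MSB) stream and a low-byte (LSB) stream.

    Assumes an 8/8 split; the split point used in the paper is not given in the abstract.
    """
    for p in pixels:
        if not 0 <= p <= 0xFFFF:
            raise ValueError("pixel values must fit in 16 bits")
    msb_stream = [(p >> 8) & 0xFF for p in pixels]
    lsb_stream = [p & 0xFF for p in pixels]
    return msb_stream, lsb_stream

def merge_bit_planes(msb_stream, lsb_stream):
    """Recombine the two streams into the original 16-bit values."""
    return [(m << 8) | l for m, l in zip(msb_stream, lsb_stream)]

# Toy pixel row, chosen for illustration only.
pixels = [0, 255, 256, 0x1234, 65535]
msb, lsb = split_bit_planes(pixels)
restored = merge_bit_planes(msb, lsb)
```

The split itself is lossless; any reconstruction error in a full system would come from the lossy coding applied to each stream afterwards.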

15
Deep learning models for chemical perturbation prediction do not yet utilise drug molecular features

Bai, J.; Prince, S.; Nitschke, G. S.

2026-05-15 bioinformatics 10.64898/2026.05.13.724458 medRxiv
Top 1%
0.4%

Recent deep learning models for L1000 chemical perturbation prediction incorporate dedicated drug molecular encoders. We retrained seven such models from scratch with zeroed or shuffled drug inputs, and compared them with a multilayer perceptron that uses only cell-line basal expression. Under drug-blind evaluation, ablation caused negligible performance changes and the drug-free baseline matched all models. Current architectures do not yet utilise drug molecular features for generalisation to unseen compounds.

16
From naive to foundation: benchmarking models for epidemic forecasting

Wang, D.; Li, Y.; Perra, N.

2026-05-13 epidemiology 10.64898/2026.05.11.26352889 medRxiv
Top 1%
0.4%

We systematically evaluate and compare the performance of classical statistical methods (ARIMA), mechanistic compartmental models (SEIR), modern deep learning architectures (LSTM, DLinear, Autoformer), and an emerging time-series foundation model (TabPFN-TS) to forecast the incidence of Influenza-Like Illness (ILI) across nine European countries. The models are benchmarked against a naive baseline and a multi-model ensemble (RespiCast) created by an initiative of the ECDC. In line with the operational practice of existing forecasting hubs, our entire evaluation is explicitly optimized for short-term horizons (1 to 4 weeks ahead). Interestingly, we found that the foundation model TabPFN-TS shows strong zero-shot inference capabilities. Without any task-specific retraining, it successfully overcomes extreme data scarcity to consistently outperform all other individual architectures, frequently rivalling or surpassing the RespiCast ensemble. Our results highlight how deep learning architectures are severely constrained by extreme data scarcity, typical in epidemic forecasting, requiring targeted endogenous data augmentation to reduce predictive errors. Within the deep learning class of models, we observe that simpler architectures (such as DLinear and LSTM) frequently exhibit greater robustness and outperform complex, attention-based models (such as Autoformer) when data is constrained. Finally, our results show how a weighted ensemble, constructed by fusing all the models, delivers highly robust forecasts in all regions considered. Overall, our findings showcase the transformative potential of zero-shot foundation models in epidemic forecasting and confirm the importance of multi-model ensembles.

17
S-IGTD: supervised tabular-to-image topology learning via between-group correlation for multiclass classification of biological data

Wu, H.-M.

2026-05-21 bioinformatics 10.64898/2026.05.19.726105 medRxiv
Top 1%
0.4%

Motivation: Tabular-to-image methods allow convolutional neural network (CNN)-based classifiers to analyse high-dimensional biological tables by mapping features onto a two-dimensional grid. Existing layouts are usually driven by unsupervised global correlation, which can place class-discriminative features far apart when nuisance or housekeeping covariation dominates the total covariance structure. Results: We present the Supervised Image Generator for Tabular Data (S-IGTD), a supervised extension of IGTD that optimizes tabular-to-image topology by replacing total-correlation distance with one minus the absolute between-group correlation, computed from class-wise feature means, under the Within-And-Between-Analysis (WABA) decomposition. We prove entrywise consistency of the supervised distance matrix under standard moment conditions and identify balanced-class settings in which S-IGTD improves a Signal Dispersion Score (SDS)-related topology objective. In controlled simulations targeting between-group signal, S-IGTD outperformed Euclidean- and correlation-distance IGTD variants in SDS, accuracy and macro-F1 score. Across five biological benchmarks ranging from 4- to 91-class classification, S-IGTD produced compact class-supervised layouts, with 24/35 Holm-adjusted significant SDS wins against seven non-reference layout controls. As a secondary downstream diagnostic, a CNN with batch normalization showed higher mean accuracy than random layouts and correlation-distance IGTD on all real datasets, and higher mean accuracy than Euclidean-distance IGTD on four of five datasets, with the clearest gains on large multiclass cancer and methylation benchmarks. Availability and implementation: Source code, datasets, configuration files and reproducibility scripts are freely available at https://github.com/hanmingwu1103/S-IGTD. Contact: wuhm@g.nccu.edu.tw
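
The supervised distance named in the abstract, one minus the absolute between-group correlation computed from class-wise feature means, can be sketched as follows. This is an illustrative reading, not the S-IGTD code: the function names are hypothetical, the correlation is weighted by class size (which reduces to a plain correlation of class means when classes are balanced), and a feature whose class means are all equal is assigned correlation 0 as an arbitrary convention.

```python
import math

def class_means(rows, labels):
    """Per-class mean of every feature, plus the number of samples in each class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        for j, value in enumerate(row):
            sums[label][j] += value
        counts[label] += 1
    means = {c: [s / counts[c] for s in sums[c]] for c in sums}
    return means, counts

def between_group_distance(rows, labels):
    """Feature-by-feature matrix of 1 - |between-group correlation|."""
    means, counts = class_means(rows, labels)
    n_features = len(rows[0])
    total = sum(counts.values())
    # Grand mean of each feature, weighting every class mean by its class size.
    grand = [sum(counts[c] * means[c][j] for c in means) / total
             for j in range(n_features)]

    def cov(j, k):
        return sum(counts[c] * (means[c][j] - grand[j]) * (means[c][k] - grand[k])
                   for c in means)

    dist = [[0.0] * n_features for _ in range(n_features)]
    for j in range(n_features):
        for k in range(n_features):
            denom = math.sqrt(cov(j, j) * cov(k, k))
            # Convention assumed here: no between-group variance means correlation 0.
            r = cov(j, k) / denom if denom > 0 else 0.0
            dist[j][k] = 1.0 - abs(r)
    return dist

# Toy table (hypothetical): 3 balanced classes, 3 features.
rows = [[0, 4, 1], [2, 2, 1], [2, 2, -2], [2, 2, -2], [3, 1, 1], [3, 1, 1]]
labels = ["a", "a", "b", "b", "c", "c"]
dist = between_group_distance(rows, labels)
```

In the toy table, features 0 and 1 have perfectly anti-correlated class means and end up at distance 0, so a layout optimizer driven by this matrix would tend to place them close together, while feature 2 is uncorrelated with feature 0 at the class-mean level and sits at distance 1.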

18
Benchmarking generative AI and physics based molecular simulation for sampling conformational heterogeneity in T4 Lysozyme

Bhakat, S.

2026-05-13 biophysics 10.64898/2026.05.10.724101 medRxiv
Top 1%
0.3%

Wild-type T4 lysozyme (T4L) is used as a benchmark to evaluate conformational sampling across generative AI, AI-accelerated molecular simulation (AMS), and physics-based enhanced molecular dynamics (EMD). A four-state model (exposed/open, exposed/closed, buried/open, and buried/closed) is defined using physically meaningful collective variables. While generative AI methods (AF-cluster, MSA subsampling of AlphaFold2, ConforFold, AlphaFlow, ESMFlow, ConfRover, BioEmu) largely sample only the exposed/open state, AMS, which integrates generative ensembles with iterative molecular dynamics, recovers all states and reproduces equilibrium populations similar to EMD and experimental smFRET signatures.

19
Advancing in silico drug design with Bayesian refinement of AlphaFold models

Sen, S.; Hoff, S. E.; Morozova, T. I.; Schnapka, V.; Bonomi, M.

2026-05-06 bioinformatics 10.1101/2025.06.25.661454 medRxiv
Top 1%
0.3%

Virtual screening has become an indispensable tool in modern structure-based drug discovery, enabling the identification of candidate molecules by computationally evaluating their potential to bind target proteins. The accuracy of such screenings critically depends on the quality of the target structures employed. Recent advances in protein structure prediction, particularly AlphaFold2, have revolutionized this field with unprecedented accuracy. However, AlphaFold2 models often exhibit limitations in local structural details, especially within binding pockets, which limit their utility for small molecule docking. In contrast, molecular dynamics simulations with accurate atomistic force fields can refine protein structures, but lack the ability to leverage the structural information provided by deep learning approaches. Here, we introduce bAIes, an integrative method that bridges this gap by combining physics-based force fields with data-driven predictions through Bayesian inference. Crucially, bAIes demonstrates a superior ability to discriminate between binders and non-binders in virtual screening campaigns, outperforming both AlphaFold2 and molecular dynamics-refined models. By enhancing the usability of AlphaFold2 models without requiring extensive experimental or computational resources, bAIes offers a convenient solution to a longstanding challenge in structure-based drug design, potentially accelerating the early phases of drug discovery.

20
Structure and Dynamics of the HIV-1 Envelope Protein on the Virion Envelope

Majumder, A.; Dutta, M.; Cherek, L.; Voth, G. A.

2026-05-18 biophysics 10.64898/2026.05.18.725998 medRxiv
Top 1%
0.3%

HIV-1 buds from infected cells as immature virion particles with a scattered envelope glycoprotein (Env) distribution on their envelope. It then undergoes maturation, during which the viral protease cleaves the Gag polyprotein at multiple sites, leading to structural reorganization of the viral particle and lateral redistribution of Env proteins, ultimately rendering the virion infectious. However, the underlying mechanism of maturation-induced Env reorganization remains elusive. In this study, we combine microsecond-long all-atom (AA), bottom-up coarse-grained (CG) molecular dynamics simulations, and diffusion model-based backmapping to investigate the structural organization and key interactions of Env in viral membranes. AA simulations of fully glycosylated Env embedded in HIV-1 mimetic asymmetric bilayers were first performed to characterize its conformational dynamics and Env-lipid interactions. We then developed a bottom-up CG model of glycosylated Env from that AA data and simulated the mature HIV-1 virion envelope containing multiple Env proteins. The CG simulations predict that Env proteins form clusters through interactions mediated by the cytoplasmic tail domain (CTD) and adopt diverse tilted conformations within these clusters. These CG simulations were then backmapped to AA resolution and further AA simulations were carried out to identify, in detail, the specific interacting residues in the Env clusters. Additionally, analysis of epitope accessibility shows that broadly neutralizing antibodies (bnAbs) targeting the V1/V2 and V3 loops may efficiently interact with Env clusters on the mature virion surface. Together, these results provide a molecular mechanism for Env oligomerization during viral maturation and offer new insights into the accessibility of bnAb epitopes on Env clusters.